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BATCH-ORIENTED SOFTWARE APPLIANCES 



Abstract. This paper presents AppPot, a system for creating Linux software 
appliances. AppPot can be run as a regular batch or grid job and executed in 
user space, and requires no special virtualization support in the infrastructure. 

The main design goal of AppPot is to bring the benefits of a virtualization- 
based IaaS cloud to existing batch-oriented computing infrastructures. 

In particular, AppPot addresses the application deployment and configu- 
ration on large heterogeneous computing infrastructures: users are enabled to 
prepare their own customized virtual appliance for providing a safe execution 
environment for their applications. These appliances can then be executed on 
virtually any computing infrastructure being in a private or public cloud as 
well as any batch-controlled computing clusters the user may have access to. 

We give an overview of AppPot and its features, the technology that makes 
it possible, and report on experiences running it in production use within the 
Swiss National Grid infrastructure SMSCG. 

r \', 1- Introduction 

f*\ • Application deployment and configuration on large heterogeneous systems is a 

complex infrastructure issue that requires coordination among various system ad- 
ministrators, end users as well as operation teams. This is further complicated 
when it comes to scientic applications that are, most of the time, not supported on 
many Linux distributios. 

^. | Virtualized infrastructures and software appliances provide an effective solution 

V^JQ ■ to these problems but do require a specific infrastructure and a usage model that 

^sO ' is markedly different form the batch-oriented processing that is still the backbone 

of scientific computing. 

This paper presents a system (nicknamed "AppPot") to bring the benefits of 
virtualization-based IaaS clouds to existing batch-oriented computing infrastruc- 
tures. 

AppPot comes in the form of a set of POSIX shell scripts that are installed into 
a GNU/Linux system image, and modify its start-up sequence to allow controlling 
task execution via the kernel boot command-line. Another utility is provided to 
invoke an AppPot system image from the shell command line, and request it to exe- 
cute a given command. The software is freely available from http://code.google.eom/p/apppot. 

This combination effectively turns AppPot into a technology for constructing 
software appliances that can also run as batch jobs in a local cluster or grid com- 
puting system. Pairing this with the User- Mode Linux virtualization system [6, 21], 
AppPot appliances require (almost) no support from grid and cluster systems ad- 
ministrators. This effectively allows use cases that have so far made IaaS cloud 
infrastructures attractive for end-users. 

The rest of the paper is organized as follow: Section 2 explains the motivation 
behind the project; Section 3 presents typical usecases and the main restrictions 
that AppPot helps addressing. Section 4 recaps the main features of User-Mode 
Linux (UML), the virtualization technology that allows AppPot to run without 
"root" user privileges. Section 5 presents the architectural details as well as the 
functional specifications. Section 6 elaborates on how the presented use cases were 
implemented using AppPot, and reports on the observed limitations. 
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2 BATCH-ORIENTED SOFTWARE APPLIANCES 

2. MOTIVATIONS 

A problem that has traditionally plagued batch computing infrastructures is 
the deployment of software applications: in a centrally-administered system, all 
requests for software installation must be acted upon by the systems administra- 
tor. This is less of an issue in local computing clusters, where users can usually 
freely install software in their own home directories, but scales up to a significant 
administration and communication problem in a large computational grid. 

In addition, some software packages (notably, many scientific codes) require com- 
plex installation procedures or provide scarce documentation, so that specific ex- 
pertise is needed to properly install and configure the application. In a grid in- 
frastructure, this poses an organizational scalability problem again: all systems 
administrators must be conversant with the installation procedures of every soft- 
ware piece. Indeed, this issue has been tackled by many grid infrastructures, either 
by providing access only to a fixed restricted set of software applications, or by 
requiring clearance procedures to elevate a subset of the users ("VO operators") 
to the privilege level needed to install software on the execution nodes. The first 
solution restricts the possibilities of users to exploit the infrastructure to its full 
potential; the second one burdens end-users with additional tasks. 

While the installation of publicly-available software packages is generally nego- 
tiable in some way, central administration of software becomes overly impractical 
when users need to deploy an application they are developing themselves. In this 
case, the code changes very frequently, and it is just not feasible to issue an instal- 
lation request for every revision. Still, users might need to execute validation tests 
and regression suites, which in the case of scientific applications can take hours to 
run; this is indeed a perfect use case for batch jobs. 

Leveraging the User-Mode Linux virtualization system, AppPot appliances can 
run on grid and local clusters as regular batch system jobs, without the need for 
sysadmin support or root access. This solves both the aforementioned problems: 

• AppPot software appliances are a way to implement generic application de- 
ployment on a computational grid, and especially to enable users to provide 
their own software to the computing cluster: a complete AppPot appliance 
consists of three files, that can be copied to the execution cluster with any 
available mechanism, including the "stage in" facilities provided by most 
grid and cluster batch systems. 

• Users can use an AppPot Virtual Machine (VM) on their computer for cod- 
ing, and then run the same VM as a Grid jobs or in a Cloud Infrastructure 
as a Service (IaaS) infrastructure for larger tests and production runs. 

3. Goals and use cases 

The following scenarios are meant as an illustration of AppPot's intended use 
cases. Throughout the paper, we shall describe how the requirements from these 
use cases translate into design decisions for AppPot, and how well the goals have 
been met. 

3.1. Deployment of complex applications. Some software packages (notably, 
many scientific codes) require complex installation procedures. Typical problems 
may be roughly summarized in the following categories: 

(Rl) The application depends on software that is not readily available on the 

host operating system, or the application depends on software or a specific 

configuration that conflicts with other systems settings. 

(R2) The application has a complex or non-standard compilation procedure, and 

the documentation is scarce. For example, the installation procedure may 
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BATCH-ORIENTED SOFTWARE APPLIANCES 3 

require that some files are hand-edited and placed into specific locations, 
but the documentation is not clear about the content and format of these 
files, or on how to adapt the examples to new systems. 

In a grid infrastructure, this poses an additional problem of scale: all systems 
administrators must be conversant with the installation procedures of every software 
piece, and every application must be compatible with all the computing systems 
available in the infrastructure. 

It is well-known how virtual appliances allow solving these problems; however, 
virtualization systems that require "root" privileges for operation pose a security 
problem, since the attack surface of an appliance is larger than the one of a unpriv- 
ileged process. This translates into the following requirement: 

(R3) Ensure that deployed appliances cannot do harm to the system. 1 (See, e.g., 
[14]) 

3.2. Running self-developed code. A large fraction of research groups are de- 
veloping their own software applications;oftentimes for computational experiments 
that are ephemeral, or limited in scope to a local group or niche community. 

All the problems illustrated in the previous use case still apply, and a few addi- 
tional features have to be considered here. Namely, small updates to the software 
appliance are frequent and further progress in the code may depend on the outcome 
of the tests run on the updated appliance, so: 

(R4) It should be easy for users to prepare appliances to run different versions 

and branches of the same software, in order to experiment with algorithmic 

variants. 
(R5) It should be easy for users to add, remove and change software dependencies 

from the appliance, as the code grows and evolves. 
(R6) The deployment procedure should not involve any one else than the code 

author: it should be completely controlled and initiated by the user. 
(R7) The deployment procedure should be as fast as possible not to interfere 

with the software write/test cycle. 

4. User-Mode Linux 

The fundamental ingredient for AppPot is the User-Mode Linux virtualization 
system [6, 21]. We shall therefore recap here the main points of its architecture and 
the features that make it suitable for AppPot. 

User-Mode Linux (UML) consists of a modified Linux kernel (guest) that runs 
as a userspace process within another Linux system (host); being a regular Linux 
kernel in almost every other aspect, UML can run any Linux distribution with any 
configuration. The main difference of UML relative to other (para)virtualization 
solutions is that User-Mode Linux can only run a Linux guest inside Linux host: 
no other Operating System (OS) is supported. 

UML supports many of the features that make OS virtualization products at- 
tractive for building appliances. AppPot leverages the following ones: 

• Any file in the host system can be mapped to a block device in the guest 
system. UML has a copy-on-write feature, wherein all writes to the filesys- 
tem are written to a separate file, so that a single filesystem image can be 
shared by many concurrent UML instances. 



If users have shell access to a system, as is commonly the case with cluster installations, 
then they can already run arbitrary code, so the requirement practically weakens to ensuring that 
appliances cannot do more harm than a regular user process already can. 
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4 BATCH-ORIENTED SOFTWARE APPLIANCES 

• Portions of the host filesystem can be grafted in the guest filesystem, with 
full read/ write access. 

• The additional helper program slirp (see [19, 10]) enables use of the host's 
TCP/UDP networking from within the guest system without requiring 
special running privileges. 

• UML enforces limits on the resource usage: e.g., a UML system will use no 
more than the memory that has been assigned to it. 

UML ensures process and kernel address space separation in the guest system 
through a mechanism called "SKASO" (see Section "UML Execution Modes" on 
page 128 of [6] for details). 

5. Architecture and usage 

An AppPot appliance appears to a user as consisting of a few hies: (1) an AppPot 
disk image, which is a complete GNU/Linux system installed in a partition image 
file (in "raw" format); (2) an UML Linux kernel; (3) a shell script apppot- start 
used to run a command-line program within the AppPot appliance; (4) a few aux- 
iliary programs that enable optional features of AppPot (networking, I/O streams 
redirection). All these files can, all or in part, be installed system- wide so that 
many users can benefit from a shared installation. 

Most of the components of AppPot only interact during the Linux boot pro- 
cess. The sequence of steps taken by an AppPot appliance from invocation to the 
execution of a user-specified command is the following. 

1. Users invoke the apppot -start script, optionally specifying a command to 
run in the AppPot appliance. Command-line options allow to set the path to 
the raw disk image file and the UML kernel, or specify UML boot arguments like 
maximum virtual memory. 

2. The UML kernel — running as a user process — performs the normal Linux 
boot sequence, mounts the raw disk image file, and executes the startup scripts. 
The Linux console I/O is redirected to the UML process standard input and output 
streams. 

3. The apppot -init script is the last program executed as part of the boot 
sequence. In detail, it does the following: 

a. It reads the kernel boot command-line, and recognizes specially-formatted 
arguments put there by apppot-start. (For instance, the path to the 
"changes" archive file.) 

b. Mounts the current working directory of apppot-start in the VM filesys- 
tem, and uses it as a working directory for all subsequent steps. 

c. Alters the UNIX UID and GID of the regular user in the AppPot appliance 
to match those of the owner of the working directory in the host filesystem. 4 

d. If a "changes" archive file is specified, merges its contents into the currently- 
running VM filesystem. (See Section 6.2 for an more details and an exam- 
pie.) 

e. Commands to run non-interactively in the AppPot appliance can be given 
on the apppot-start command-line, or specified in a script named "apppot- 
run" which has to be placed in the host directory where AppPot is run. (For 



Normal access control on the host filesystem applies: for instance, a guest run by an unprivi- 
leged user cannot modify files that are owned by the super-user "root". 

''UML is actually extremely flexible in terms of network device support, which has made it very 
popular as a base for network simulation and teaching environments: for example, see [16, 11, 4] 

This is necessary as accesses to the host filesystem undergo ordinary access control by the 
host kernel; operations performed by processes running in the UML Virtual Machine appear as 
done by the user running the UML process in the host system. 
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BATCH-ORIENTED SOFTWARE APPLIANCES 5 

example, the "apppot-run" script can pre-process the input data and then 
execute the main application.) 

So, if a command is specified on the boot command line, apppot-init 
runs it; otherwise, it checks if a startup script exists in the appliance or in 
the working directory. 

If none of the above applies, apppot-init starts an interactive shell on 
the system console. 
/. When the command in the previous step has terminated, apppot-init 
initiates the shutdown sequence. 

Communication between the apppot- start script (invoked by the appliance 
users) and the apppot-init one (which drives the VM boot process) happens 
through the kernel boot parameters. This is a very generic mechanism, that can 
easily be extended to drive AppPot appliances with any virtualization Application 
Programming Interface (API) that supports some kind of argument passing between 
the caller and the VM. For instance, the "user-data" mechanism of the EC2 API 
could serve this purpose as well. 

The reference AppPot image is based on the stable release of the Debian distri- 
bution. Apart from being optimized for running a single-user task in a virtualized 
environment (i.e., multi-user support, hardware-related software and most daemons 
have been turned off, resulting in a total startup time of 9s), it is a regular Debian 
install. 

Note that, since the AppPot system image is a raw disk image file, it can be 
run through any virtualization software that can read this disk format (e.g., KVM 
or Xen); so it can also be used on infrastructures that support full virtualization 
(i.e. IaaS cloud), or on non-Linux hosts. On the other hand, running an AppPot 
appliance as a batch job through the apppot-start script is currently supported 
with UML only. While support for any virtualization system accessible with the 
libvirt API [12] could easily be added, starting a virtual machine is a privileged 
operation with most virtualization systems, and it would not be safe to allow any 
user to do that, since resource allocation is not handled by the hypervisor software. 5 

Data movement in- and out of the VM happens through the shared filesys- 
tem: in particular, this allows AppPot to run any command in a working directory 
that is shared with the host system, so that invocation of a command through 
apppot-start is completely transparent to the user. However, this mechanism is 
UML-specific; if a different virtualization system is to be used, alternatives should 
be implemented to move data in and out of the AppPot appliance. 

5.1. Using AppPot. Users receive an AppPot system image, containing a work- 
ing installation of a GNU/Linux distribution. Users have full access rights to the 
AppPot system image thus they can modify it by installing new software, libraries, 
reference data e.g., their own version of a computational code or a reference dataset 
that will be used during the computational analysys. The choice of Debian as the 
base distribution plays a role here, as there are already several thousand packages 
available in the Debian main archive, including several popular scientific codes [5]. 

There are three main usage modes of AppPot, detailed below. 

Interactive local execution. In this case AppPot is started on a local machine; 
the disk image file as well as input data are directly available. Typical use cases 
of this usage mode are code validation and appliance customization: users start 
the AppPot appliance invoking "apppot-start" from the shell passing the path to 
the disk image file, UML kernel and, if necessary, local filesystem location of input 



Indeed, in many consolidation use cases, a physical machine's resources are oversubscribed by 
starting more VMs then the underlying hardware could actually run at the same time. 
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6 BATCH-ORIENTED SOFTWARE APPLIANCES 

data. At the end of the boot process, an interactive shell is opened on the system 
console. 

Batch job on a cluster resource. This is a more common case for data analysis 
and scientific code execution. In a typical cluster setup, appliance image file, UML 
kernel and input data are made available to the batch cluster execution node; 
the apppot- start command is invoked by the batch job to run a command non- 
interactively within the AppPot appliance. 

AppPot execution is monitored through the batch job stdout: the AppPot 
startup script takes care of redirecting the appliance console output onto the stan- 
dard output of the batch job. 

Grid job. AppPot can also be executed as a grid job on a distributed infrastruc- 
ture. In this case, the disk image file, execution script and reference data need to be 
transferred to the destination node before the execution. This is normally achieved 
by specifying those input files as part of the grid job description file. Similarly to 
the batch job execution case, appliance execution on the grid can be monitored 
throught the grid job's standard output stream. 

If the base AppPot image is already deployed at the execution site, network 
traffic can be reduced by sending just a changes archive file with the differences 
between the base AppPot and the user-modified one. 

6. Real- world usage 

This section illustrates how AppPot can be used to address the issues presented 
in Section 3. As a production environment, AppPot instances have been run as 
grid jobs on the distributed and heterogeneous Swiss National Grid Infrastructure 
SMSCG. 

6.1. Deployment of complex applications. In Section 3.1, we identified three 
issues that can make application deployment on a cluster a complex sysadmin task. 

A common solution to issue (R3) is to deploy only appliances that have under- 
gone a "certification" process by the local security personnel or other trusted entity. 
AppPot solves the issue by allowing the execution of the appliance through UML; 
there is thus no need to supervise the appliances installed on the system, as they 
can gain no more privileges than the executing user already can. Actually, systems 
administrators can just provide the minimal UML infrastructure (the kernel binary, 
the slirp executable, etc.) and allow users to install their own appliances. 

Issues (Rl) and (R2) can be mitigated by virtualized appliances of any kind; the 
specific contribution of AppPot in this case is to enable the use of such appliances 
on existing batch-oriented computing infrastructures. 

In addition, AppPot allows the use of software appliance in a non-interactive 
fashion, and especially in traditional UNIX command-line scripting environments. 
In particular, this enables the integration of AppPot appliances in data analysis 
pipelines and other automated data processing systems. 

As an example, we consider the use case illustrated in [3]: ABC is a compu- 
tational chemistry code, that takes as input a Potential Energy Surface (PES) 
specification, in the form of a Fortran function. For computational efficiency, this 
PES function is compiled together with the rest of the ABC source code to form 
an executable binary specific for a certain molecule. However, the ABC build pro- 
cedure requires the G95 Fortran compiler [8], which is not part of any common 
GNU/Linux distribution. AppPot allows to create an ABC appliance by extending 
the base image with the needed G95 compiler together with the ABC source code, 
and an "auto run" script that looks for the PES file, compiles and bundles it into 
the ABC binary, and then run the resulting executable with the user-supplied pa- 
rameters. The resulting appliance provides a solution for ABC, that preserves the 
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flexibility of the original code, but has no dependencies and thus can be deployed 
and run on any GNU/Linux cluster. In particular, this has been used in the cited 
[3] to implement a grid-based analysis workflow. 

As a further example, consider the generation of plots and graphs to create 
synthetic views of the analyzed data. Traditionally, the generation of plots and 
reports has been done "offline" on desktop machines, or on separate visualization 
clusters. However, with the increase of the volume of processed data, this is no 
longer feasible, and there is a growing request to use the computing facilities to 
generate such plots and reports. But no single graphing library or system has yet 
emerged as a widely-used standard, which implies that a good fraction of users will 
not find their favorite graphing library pre-installed on the computing infrastructure 
they have access to. Using AppPot, a user can install its favourite plotting library 
and its dependencies (e.g., Python 2.7 with Matplotlib [13]) and run the post- 
processing plotting step as a regular computational job. 

6.2. Running development code. The gist of requirements (R6)-(R5) is that 
users should be able to quickly create new AppPot appliances and deploy those on 
the execution sites without the need for sysadmin support. Here we show how this 
can be accommodated by the tools provided by AppPot. 

Recall from Section 5.1 that the AppPot disk image is a regular file in the 
host filesystem, and that AppPot users have full control on it. Indeed, they can 
arbitrarily modify a running disk image using standard GNU/Linux administration 
commands in the appliance. It is thus easy to make several copies and modify each 
of them independently. This clearly satisfies requirements (R4) and (R5). 

Requirement (R7) can be addressed using the snapshot/changes implemented in 
AppPot: users can create a "changes" file that encodes the differences of the locally- 
modified appliance with a "base" one. During AppPot boot, the apppot-init script 
will re-create the modified appliance from the base one, by merging in the changes. 

The "changes" files are produced with the apppot-snap utility. It should be 
invoked a first time in order to mark a "base" system: it records the state of all files 
in the appliance. This "base" system is then deployed in a location on the execution 
cluster, shared by all users. 

When a user has finished modifying its local copy of the AppPot appliance, 
invokes apppot-snap again. Each time it is invoked with the changes subcommand, 
apppot-snap compares the state and content of each file with the recorded state, 
and creates an archive file with all changed content. 

This archive file can then be merged into a different appliance that has the same 
"base" content. Therefore, users can just send the "changes" file along with their 
grid jobs, to re-create their local AppPot environment on the remotely-installed 
one. This clearly minimizes the time to deployment of a modified appliance. 

6.3. Dynamical expansion of clusters. An AppPot-based compute node VM 
has been used in the VM-MAD to provide dynamic expansion of a computational 
cluster using virtualized computing hardware. In VM-MAD, an AppPot instance 
is submitted to the SMSCG infrastructure each time a site seeks to expand its 
resources; the AppPot instance has been customized by the site admin to be a 
replica of the standard compute node, that connects back to the home site using a 
VPN. 

This practice is generally known as "cloudbursting" [18], when the additional 
resources are drawn from a public Cloud (e.g., Amazon's EC2). 

The use of AppPot appliances as batch cluster compute nodes, allows an institu- 
tion computational cluster to be expanded using nodes from other resources in the 
same infrastructure, without its users realizing that the are using grid computing 
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at all. This is similar to the "glide-in" mechanism implemented in the Condor batch 
execution system [20]. 

7. Limitations 

Usage of AppPot has also shown some unexpected limitations and issues, mostly 
with the attempt of doing a completely user-controlled and user-initiated deploy- 
ment of VMs. 

Most issues in the actual usage of AppPot appliances originate from the way 
UML's "SKASO" mode operates: UML must ensure address space separation of 
processes running in the virtualized system, but this is normally achieved by an 
OS by using hardware-assisted protection of memory pages, which is a privileged 
operation. So the UML author found this solution: 

a. Each execution context 6 in the guest system is actually a separate process 
in the host system. 

b. For each memory page needed by the guest kernel, a page of memory is 
allocated from the host kernel. 

c. All memory pages that are shared among processes in the guest system are 
directly mapped (via a mmap ( ) call) to a segment of data in a temporary 
file. 

This effectively offloads address space separation to the host kernel (by a.) while 
still retaining the possibility of sharing portions of memory (by b. and c), which 
is necessary because all processes in the virtual system must see the same kernel 
image. 

Therefore, a UML process makes a large number of mmapO calls to the host 
system (one for each page of memory handed out by the guest kernel). As it turns 
out, the default limit on the number of mmap-ed segments of memory, bounds the 
maximum memory that a UML kernel can allocate to 256MiB, which is too low for 
any computational usage. 

In addition, all these memory pages are backed by temporary "shared memory" 
file storage; therefore, the available space on the filesystem where these files are 
created limits the total amount of memory available to AppPot/UML VMs. 

In both cases, it is a straightforward systems administration task to change the 
default so that UML Virtual Machines can allocate a larger portion of memory. 
As simple as they are, especially compared to installation and support of arbitrary 
libraries and applications, these configuration settings still represent an enabling 
step that systems administrators must perform in order to allow AppPot appliances 
on a computational cluster: users cannot do everything by themselves. 7 

A different kind of issue originates in the way batch systems enforce job mem- 
ory limits. This issue has been observed with Sun/Oracle Grid Engine 6.2 and 
PBS/TORQUE; it's very likely to happen with other batch systems as well. 

Recall that each process and thread in the UML VM is actually running as a 
process in the host system. These processes share part of their memory space (the 
whole kernel and its data structure); in the case of threads they actually share 
most of it. However, it seems that batch systems assume that much of the memory 



In the Linux kernel, processes and threads are both instances of a more general "execution 
context" concept. 

For the sake of exactness, only altering of the kernel settings is a privileged operation: changing 
the backing filesystem for UML memory can be done by any user via an environment variable. 
However, this implies that users know where a suitable filesystem is located on each cluster, 
which assumes a more detailed knowledge about the cluster setup than would be desirable, not to 
mention that in a large grid infrastructure to gather this information for each and every cluster 
is a nontrivial task in itself. 
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used by a process is private and only a negligible fraction is shared; therefore 
they wrongly reckon the total memory usage by UML processes by summing the 
memory occupation without accounting for shared pages. Hence, almost UML- 
based appliances quickly hit the memory limit and are killed by the batch system. 

Circumventing this bug is technically simple (turn off enforcement of memory 
limits for UML batch jobs), but this can be a significant policy change in the 
management of the cluster, which brings back the negotiations with the centralized 
cluster administration that UML virtualization was meant to avoid. 

While Message Passing Interface (MPI) communication among AppPot instances 
is possible (although this is matter for future development) , UML lacks support for 
Symmetric Multi-Processing (SMP), which limits its use in multi-threaded appli- 
cations. 

8. Related work 

A related attempt to use virtualization in the world of scientific computing as 
been made by CERN in order to "allow end-users to effectively use their desktops 
and laptops as for analysis and Grid interface" [2]. The CernVM virtual machine 
image can run under the Xen or KVM hypervisors and contains a minimal operating 
system installation to run the LHC experiment software as well as to function as an 
interface host for the WLCG grid infrastructure; access to the actual experiment 
software and grid middleware happens through an HTTP-based filesystem: the real 
application files are stored on a server farm at CERN. There are thus two major 
differences to AppPot: 

• The CernVM is itself a specific software appliance, not intended for cus- 
tomization or redistribution by users. 

• The way CernVM bridges the local user environment and the batch-execution 
one "goes in the reverse direction" relative to AppPot: CernVM lets users 
run programs in the same environment they find in the grid systems, 
whereas a use case for AppPot is to let the batch jobs environment be 
customized and prepared by the user. 

The use of virtualization in cluster environments has been considered in [7]; 
however, the authors approached the usage of the Xen and UML virtualization 
systems in order to provide "virtual clusters" out of a local pool of general-purpose 
hardware. A somewhat similar use case for AppPot is discussed in Section 6.3. 

The use of UML for High-Performance Computing (HPC) and distributed appli- 
cations is also the subject of [9, 17]; the aim of the authors is rather to provide "a 
middleware system [...] to support mutually isolated virtual distributed environ- 
ments in shared infrastructures", and the focus is on supporting applications that 
do not fit the batch computing or the service model. 

The "application virtualization" technique seems to be more well-established in 
the Windows operating system community (see: [1, 15]) than it is in the Linux 
one. However, the authors of this paper know of no earlier attempt to bring the 
"software appliances" concept into batch-oriented computing. 

9. Conclusions 

This paper has presented a system to bring the benefits of virtualization-based 
IaaS clouds to existing batch-oriented computing infrastructures. 

AppPot is currently used in production within the Swiss National Grid Infras- 
tructure SMSCG, supporting several use cases like those presented in this paper. 
We are collecting feedback on the effectiveness of AppPot in large-scale grid com- 
putations; we would like to stress that such effectiveness is not just a function of 
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10 BATCH-ORIENTED SOFTWARE APPLIANCES 

system performance, but should also include consideration of how it makes large- 
scale computing more accessible (on the users' side) and manageable (on the systems 
administrators side). 

Some technical improvements could spark more widespread adoption; they are 
discussed in the following subsection. 

9.1. Future development. Building on another feature of UML, namely, tunnel- 
ing all Internet Protocol (IP) traffic through a local User Datagram Protocol (UDP) 
multicast socket, it would be possible to use AppPot appliances to run MPI jobs. 
The apppot- start script would recognize a parallel environment, and launch an 
AppPot instance for every MPI rank, with the appropriate network parameters so 
that the instances can communicate over the multicast channel. The combined 
action of virtualization and tunneling could possibly be a heavy performance hit 
on the MPI subsystem, but the increase in flexibility and the other advantages of 
virtualized appliances could still make this an attractive solution in some cases. 

Finally, as explained in Section 5, the generic architecture of AppPot can easily 
be adapted to run on a variety of virtualization systems. We believe that such an 
extended AppPot can be an effective construction ingredient in mixed Grid/Cloud 
infrastructures. 
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Appendix A. List of acronyms 

API Application Programming Interface 

CERN Centre Europeen pour la Recherche Nucleaire (European Center 

for Nuclear Research) 

EC2 Amazon's Elastic Compute Cloud 

GID Group Identifier 

HPC High- Performance Computing 

HTTP Hyper Text Transfer Protocol 

KVM Kernel-based Virtual Machine 

laaS Infrastructure as a Service 

IP Internet Protocol 

LHC Large Hadron Collider 

MPI Message Passing Interface 

OS Operating System 

PBS Portable Batch System 

PES Potential Energy Surface 

POSIX Portable Operating System Interface 

SMP Symmetric Multi-Processing 

SMSCG Swiss Multi-Science Computational Grid 

TORQUE Terascale Open-source Resource and QUEue manager 

TCP Transmission Control Protocol 

UDP User Datagram Protocol 

UID User Identifier 

UML User- Mode Linux 

VM Virtual Machine 

VM-MAD Virtual Machines Management and Advanced Deployment 

(project ETHZ.7 funded by the SWITCH AAA track) 

VO Virtual Organization 

VPN Virtual Private Network 

WLCG Worldwide LHC Computing Grid 
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